video generator
DHS is using Google and Adobe AI to make videos
Immigration agencies have been flooding social media with bizarre, seemingly AI-generated content. We now know more about what might be making it. The US Department of Homeland Security is using AI video generators from Google and Adobe to make and edit content shared with the public, a new document reveals. It comes as immigration agencies have flooded social media with content to support President Trump's mass deportation agenda--some of which appears to be made with AI--and as workers in tech have put pressure on their employers to denounce the agencies' activities. The document, released on Wednesday, provides an inventory of which commercial AI tools DHS uses for tasks ranging from generating drafts of documents to managing cybersecurity. In a section about "editing images, videos or other public affairs materials using AI," it reveals for the first time that DHS is using Google's Veo 3 video generator and Adobe Firefly, estimating that the agency has between 100 and 1,000 licenses for the tools.
- North America > United States > Massachusetts (0.05)
- Asia > China (0.05)
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
Li, Zizhang, Yu, Hong-Xing, Liu, Wei, Yang, Yin, Herrmann, Charles, Wetzstein, Gordon, Wu, Jiajun
WonderPlay is a novel framework integrating physics simulation with video generation for generating action-conditioned dynamic 3D scenes from a single image. While prior works are restricted to rigid body or simple elastic dynamics, WonderPlay features a hybrid generative simulator to synthesize a wide range of 3D dynamics. The hybrid generative simulator first uses a physics solver to simulate coarse 3D dynamics, which subsequently conditions a video generator to produce a video with finer, more realistic motion. The generated video is then used to update the simulated dynamic 3D scene, closing the loop between the physics solver and the video generator. This approach enables intuitive user control to be combined with the accurate dynamics of physics-based simulators and the expressivity of diffusion-based video generators. Experimental results demonstrate that WonderPlay enables users to interact with various scenes of diverse content, including cloth, sand, snow, liquid, smoke, elastic, and rigid bodies -- all using a single image input. Code will be made public. Project website: https://kyleleey.github.io/WonderPlay/
- North America > United States > Utah (0.04)
- Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)
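The closed loop described in the WonderPlay abstract above (a physics solver produces coarse dynamics, a video generator refines them, and the refined result updates the simulated scene) can be pictured with a short sketch. Everything below is a hypothetical stand-in for illustration, not the authors' implementation: the solver is a toy particle integrator and the generator is a stub that perturbs its input.

```python
# Hypothetical sketch of a solver/generator loop in the spirit of the WonderPlay
# abstract. The classes and update rules here are stand-ins, not the paper's code.
import numpy as np

class CoarsePhysicsSolver:
    """Toy solver: advances particle positions under gravity (coarse 3D dynamics)."""
    def step(self, positions, velocities, dt=0.05):
        velocities = velocities + np.array([0.0, 0.0, -9.8]) * dt
        positions = positions + velocities * dt
        return positions, velocities

class VideoGeneratorStub:
    """Stand-in for a diffusion video generator conditioned on coarse renders."""
    def refine(self, coarse_positions):
        # Pretend the generator adds fine-scale motion detail (here: small noise).
        return coarse_positions + 0.01 * np.random.randn(*coarse_positions.shape)

def wonderplay_loop(positions, velocities, action, n_steps=10, dt=0.05):
    solver, generator = CoarsePhysicsSolver(), VideoGeneratorStub()
    velocities = velocities + action                                  # user action perturbs the scene
    for _ in range(n_steps):
        positions, velocities = solver.step(positions, velocities, dt)  # coarse 3D dynamics
        refined = generator.refine(positions)                           # finer, more realistic motion
        # Close the loop: the "generated video" updates the simulated scene state.
        velocities = velocities + (refined - positions) / dt
        positions = refined
    return positions

if __name__ == "__main__":
    pts, vel = np.zeros((100, 3)), np.zeros((100, 3))
    print(wonderplay_loop(pts, vel, action=np.array([0.5, 0.0, 0.0])).mean(axis=0))
```

The point of the loop, as the abstract frames it, is that user actions enter through the physics side while the generator is free to add motion detail the solver cannot model.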
STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows
Gu, Jiatao, Shen, Ying, Chen, Tianrong, Dinh, Laurent, Wang, Yuyang, Bautista, Miguel Angel, Berthelot, David, Susskind, Josh, Zhai, Shuangfei
Normalizing flows (NFs) are end-to-end likelihood-based generative models for continuous data, and have recently regained attention with encouraging progress on image generation. Yet in the video generation domain, where spatiotemporal complexity and computational cost are substantially higher, state-of-the-art systems almost exclusively rely on diffusion-based models. In this work, we revisit this design space by presenting STARFlow-V, a normalizing flow-based video generator with substantial benefits such as end-to-end learning, robust causal prediction, and native likelihood estimation. Building upon the recently proposed STARFlow, STARFlow-V operates in the spatiotemporal latent space with a global-local architecture which restricts causal dependencies to a global latent space while preserving rich local within-frame interactions. This eases error accumulation over time, a common pitfall of standard autoregressive diffusion model generation. Additionally, we propose flow-score matching, which equips the model with a light-weight causal denoiser to improve the video generation consistency in an autoregressive fashion. To improve the sampling efficiency, STARFlow-V employs a video-aware Jacobi iteration scheme that recasts inner updates as parallelizable iterations without breaking causality. Thanks to the invertible structure, the same model can natively support text-to-video, image-to-video as well as video-to-video generation tasks. Empirically, STARFlow-V achieves strong visual fidelity and temporal consistency with practical sampling throughput relative to diffusion-based baselines. These results present the first evidence, to our knowledge, that NFs are capable of high-quality autoregressive video generation, establishing them as a promising research direction for building world models. Code and generated samples are available at https://github.com/apple/ml-starflow.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
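As a rough illustration of the global-local, causal design the STARFlow-V abstract describes, the sketch below builds a toy normalizing flow over per-frame latents: a recurrent module provides a small global context computed only from past frames, while an affine-coupling transform handles each frame locally, so an exact log-likelihood is available by construction. This is an illustrative toy under my own assumptions, not the apple/ml-starflow code, and it omits flow-score matching and the Jacobi sampling scheme.

```python
# Toy causal normalizing flow over video frame latents (illustrative only).
import math
import torch
import torch.nn as nn

class FrameCoupling(nn.Module):
    """Affine coupling over half the frame latent, conditioned on a global context."""
    def __init__(self, dim, ctx_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2 + ctx_dim, 128), nn.ReLU(),
                                 nn.Linear(128, dim))  # outputs scale and shift
    def forward(self, x, ctx):
        x1, x2 = x.chunk(2, dim=-1)
        s, t = self.net(torch.cat([x1, ctx], -1)).chunk(2, dim=-1)
        z2 = x2 * torch.exp(s) + t                     # invertible given x1 and ctx
        return torch.cat([x1, z2], -1), s.sum(-1)      # transformed latent, log|det J|

class CausalVideoFlow(nn.Module):
    def __init__(self, frame_dim=64, ctx_dim=32):
        super().__init__()
        self.coupling = FrameCoupling(frame_dim, ctx_dim)
        self.summarize = nn.GRU(frame_dim, ctx_dim, batch_first=True)  # causal global summary

    def log_likelihood(self, frames):
        """frames: (batch, time, frame_dim) latent video; returns per-sample log-likelihood."""
        B, T, D = frames.shape
        ctx_seq, _ = self.summarize(frames)            # state at t depends on frames <= t
        # Shift by one so frame t is conditioned only on frames < t (strict causality).
        ctx = torch.cat([torch.zeros(B, 1, ctx_seq.size(-1)), ctx_seq[:, :-1]], dim=1)
        ll = 0.0
        for t in range(T):
            z, logdet = self.coupling(frames[:, t], ctx[:, t])
            ll = ll + (-0.5 * z.pow(2) - 0.5 * math.log(2 * math.pi)).sum(-1) + logdet
        return ll

if __name__ == "__main__":
    model = CausalVideoFlow()
    print(model.log_likelihood(torch.randn(2, 8, 64)).shape)  # torch.Size([2])
```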
OpenAI launch of video app Sora plagued by violent and racist images: 'The guardrails are not real'
'In a video documented by 404 Media, SpongeBob was dressed like Adolf Hitler.' OpenAI launched the latest iteration of its artificial intelligence-powered video generator on Tuesday, adding a social feed that allows people to share their realistic videos. OpenAI's own terms of service for Sora, as well as for ChatGPT's image or text generation, prohibit content that "promotes violence" or, more broadly, "causes harm". In prompts and clips reviewed by the Guardian, Sora generated several videos of bomb and mass-shooting scares, with panicked people screaming and running across college campuses and in crowded places like New York's Grand Central Station. Other prompts created scenes from war zones in Gaza and Myanmar, where children fabricated by AI spoke about their homes being burned. One video with the prompt "Ethiopia footage civil war news style" had a reporter in a bulletproof vest speaking into a microphone saying the government and rebel forces were exchanging fire in residential neighborhoods.
- North America > United States > New York (0.25)
- Asia > Myanmar (0.25)
- Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.25)
- (4 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
What Happens Next? Anticipating Future Motion by Generating Point Trajectories
Boduljak, Gabrijel, Karazija, Laurynas, Laina, Iro, Rupprecht, Christian, Vedaldi, Andrea
We consider the problem of forecasting motion from a single image, i.e., predicting how objects in the world are likely to move, without the ability to observe other parameters such as the object velocities or the forces applied to them. We formulate this task as conditional generation of dense trajectory grids with a model that closely follows the architecture of modern video generators but outputs motion trajectories instead of pixels. This approach captures scene-wide dynamics and uncertainty, yielding more accurate and diverse predictions than prior regressors and generators. We extensively evaluate our method on simulated data, demonstrate its effectiveness on downstream applications such as robotics, and show promising accuracy on real-world intuitive physics datasets. Although recent state-of-the-art video generators are often regarded as world models, we show that they struggle with forecasting motion from a single image, even in simple physical scenarios such as falling blocks or mechanical object interactions, despite fine-tuning on such data. We show that this limitation arises from the overhead of generating pixels rather than directly modeling motion.
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
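The key representational choice in the abstract above is to generate dense point trajectories rather than pixels. A minimal sketch of that output interface, with a toy image encoder and a noise input to allow multiple plausible futures, might look like the following; the architecture and names are assumptions for illustration, not the paper's model.

```python
# Toy predictor that maps a single image to a dense grid of future point
# trajectories (displacements), instead of generating pixels. Illustrative only.
import torch
import torch.nn as nn

class TrajectoryGridPredictor(nn.Module):
    """Maps an image to a (grid_h * grid_w) x T x 2 grid of future point displacements."""
    def __init__(self, grid=16, horizon=12, latent=256):
        super().__init__()
        self.grid, self.horizon = grid, horizon
        self.encoder = nn.Sequential(                       # toy image encoder
            nn.Conv2d(3, 32, 4, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, latent))
        self.noise_proj = nn.Linear(32, latent)             # noise -> multiple plausible futures
        self.decoder = nn.Linear(latent, grid * grid * horizon * 2)

    def forward(self, image, noise):
        h = self.encoder(image) + self.noise_proj(noise)
        traj = self.decoder(h).view(-1, self.grid * self.grid, self.horizon, 2)
        return traj   # per-point (dx, dy) displacements over the forecast horizon

if __name__ == "__main__":
    model = TrajectoryGridPredictor()
    img, z = torch.randn(1, 3, 256, 256), torch.randn(1, 32)
    print(model(img, z).shape)   # torch.Size([1, 256, 12, 2])
```

Because the output is a comparatively small trajectory grid rather than a full video, the model sidesteps the pixel-generation overhead the abstract identifies as the bottleneck for motion forecasting.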
I Went to an AI Film Festival Screening and Left With More Questions Than Answers
Last year, filmmaker Paul Schrader--the director of Blue Collar, American Gigolo, and First Reformed, and writer of Martin Scorsese's Taxi Driver--issued what seemed like the last word on artificial intelligence in Hollywood filmmaking. A few days after the release of Denis Villeneuve's sci-fi blockbuster Dune: Part Two, Schrader asked his Facebook followers: "Will Dune 3 be made by AI? And, if it is, how will we know?" Schrader is well regarded not only as a director, but also as one of cinema's top-shelf curmudgeons, quick with a wry burn or baiting shit-post. But his Dune post seemed like more than another provocation. It spoke to a mounting feeling among many filmgoers, myself included: that Hollywood had stooped to producing sleek, antiseptic images so devoid of personality that they might as well have been made not by a living, breathing, thinking, feeling artist, but by a computer.
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
Sora, OpenAI's video generator, has hit the UK. It's obvious why creatives are worried
If you want to know why Tyler Perry put an $800m (£635m) expansion of his studio complex on hold, type "two people in a living room in the mountains" into OpenAI's video generation tool. The result from artificial intelligence-powered Sora, which was released in the UK and Europe on Friday, indicates why the US TV and film mogul paused his plans. Perry said last year after seeing previews of Sora that if he wanted to produce that mountain shot, he may not need to build sets on location or on his lot. "I can sit in an office and do this with a computer, which is shocking to me," he said. The result from a simple text prompt is only five seconds long – you can go up to 20 seconds and also stitch together much longer videos from the tool – and the "actors" display telltale problems with their hands (a common problem with AI tools).
- Europe > United Kingdom (0.53)
- North America > United States (0.05)
- Media > Film (0.51)
- Leisure & Entertainment (0.51)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.79)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.79)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.76)
Embodiment-Agnostic Action Planning via Object-Part Scene Flow
Tang, Weiliang, Pan, Jia-Hui, Zhan, Wei, Zhou, Jianshu, Yao, Huaxiu, Liu, Yun-Hui, Tomizuka, Masayoshi, Ding, Mingyu, Fu, Chi-Wing
Observing that the key to robotic action planning is to understand the target-object motion when its associated part is manipulated by the end effector, we propose to generate the 3D object-part scene flow and extract its transformations to solve the action trajectories for diverse embodiments. The advantage of our approach is that it derives the robot action explicitly from object motion prediction, yielding a more robust policy by understanding the object motions. Also, beyond policies trained on embodiment-centric data, our method is embodiment-agnostic, generalizable across diverse embodiments, and able to learn from human demonstrations. Our method comprises three components: an object-part predictor to locate the part for the end effector to manipulate, an RGBD video generator to predict future RGBD videos, and a trajectory planner to extract embodiment-agnostic transformation sequences and solve the trajectory for diverse embodiments. Trained on videos even without trajectory data, our method still outperforms existing works significantly, by 27.7% and 26.2% on the prevailing virtual environments MetaWorld and Franka-Kitchen, respectively. Furthermore, we conducted real-world experiments, showing that our policy, trained only with human demonstrations, can be deployed to various embodiments.
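The trajectory-planner step in the abstract above (extracting embodiment-agnostic transformation sequences from predicted object-part motion) can be illustrated with a standard rigid-alignment computation: given the same part's point cloud in consecutive predicted frames, a Kabsch/Procrustes fit recovers the per-step rotation and translation, and chaining those gives a trajectory any embodiment could track. The functions below are my own sketch of that idea, not the paper's code.

```python
# Sketch: recover per-step rigid transforms of a manipulated part from predicted
# point clouds, then chain them into an embodiment-agnostic trajectory.
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst_i ~= R @ src_i + t (Kabsch)."""
    c_src, c_dst = src.mean(0), dst.mean(0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # fix an improper rotation (reflection)
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def part_trajectory(part_clouds):
    """part_clouds: list of (N, 3) arrays of the same part across predicted frames."""
    poses = []
    for prev, curr in zip(part_clouds, part_clouds[1:]):
        poses.append(rigid_transform(prev, curr))   # per-step SE(3) transform
    return poses                                    # any embodiment can track these transforms

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    part = rng.normal(size=(50, 3))
    R_true = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
    frames = [part, part @ R_true.T + np.array([0.1, 0.0, 0.02])]
    R, t = part_trajectory(frames)[0]
    print(np.allclose(R, R_true, atol=1e-5), np.round(t, 3))   # True [0.1 0. 0.02]
```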
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data
Yang, Tao, Shi, Yangming, Huang, Yunwen, Chen, Feng, Zheng, Yin, Zhang, Lei
Text-to-video (T2V) generation has gained significant attention due to its wide applications to video generation, editing, enhancement, translation, etc. However, high-quality (HQ) video synthesis is extremely challenging because of the diverse and complex motions that exist in the real world. Most existing works try to address this problem by collecting large-scale HQ videos, which are inaccessible to the community. In this work, we show that publicly available limited and low-quality (LQ) data are sufficient to train an HQ video generator without recaptioning or finetuning. We factorize the whole T2V generation process into two steps: generating an image conditioned on a highly descriptive caption, and synthesizing the video conditioned on the generated image and a concise caption of motion details. Specifically, we present Factorized-Dreamer, a factorized spatiotemporal framework with several critical designs for T2V generation, including an adapter to combine text and image embeddings, a pixel-aware cross attention module to capture pixel-level image information, a T5 text encoder to better understand motion descriptions, and a PredictNet to supervise optical flows. We further present a noise schedule, which plays a key role in ensuring the quality and stability of video generation. Our model lowers the requirements on detailed captions and HQ videos, and can be directly trained on limited LQ datasets with noisy and brief captions such as WebVid-10M, largely alleviating the cost of collecting large-scale HQ video-text pairs. Extensive experiments in a variety of T2V and image-to-video generation tasks demonstrate the effectiveness of our proposed Factorized-Dreamer. Our source code is available at https://github.com/yangxy/Factorized-Dreamer/.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
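The factorization described in the abstract above splits text-to-video into an appearance step and a motion step. A minimal sketch of that pipeline, with placeholder generators standing in for the real text-to-image and image-to-video models, might look like this (the class names and stubs are assumptions, not the released Factorized-Dreamer code):

```python
# Two-step T2V sketch: descriptive caption -> image, then image + motion caption -> video.
import numpy as np

class TextToImageStub:
    def generate(self, descriptive_caption: str) -> np.ndarray:
        # Stand-in for a text-to-image model: returns an RGB frame.
        rng = np.random.default_rng(abs(hash(descriptive_caption)) % (2**32))
        return rng.uniform(0, 1, size=(256, 256, 3))

class ImageToVideoStub:
    def generate(self, first_frame: np.ndarray, motion_caption: str, n_frames: int = 16):
        # Stand-in for the motion model: drifts the first frame over time.
        shift = 1 if "right" in motion_caption else -1
        return np.stack([np.roll(first_frame, shift * t, axis=1) for t in range(n_frames)])

def factorized_t2v(descriptive_caption: str, motion_caption: str) -> np.ndarray:
    image = TextToImageStub().generate(descriptive_caption)       # step 1: appearance
    video = ImageToVideoStub().generate(image, motion_caption)    # step 2: motion
    return video

if __name__ == "__main__":
    clip = factorized_t2v(
        "a red vintage car parked on a rainy neon-lit street at night",
        "camera pans right slowly")
    print(clip.shape)   # (16, 256, 256, 3)
```

Splitting the task this way is what lets the appearance step lean on rich captions while the video step only needs a brief motion description, which is how the paper argues limited, low-quality video data can suffice.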
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
Li, Ruining, Zheng, Chuanxia, Rupprecht, Christian, Vedaldi, Andrea
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics. At test time, given a single image and a sparse set of motion trajectories (i.e., drags), Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions. This is achieved by fine-tuning a large-scale pre-trained video diffusion model, for which we propose a new conditioning architecture to inject the dragging control effectively. More importantly, we introduce the all-to-first attention mechanism, a drop-in replacement for the widely adopted spatial attention modules, which significantly improves generation quality by addressing the appearance and background issues in existing models. Unlike other motion-conditioned video generators that are trained on in-the-wild videos and mostly move an entire object, Puppet-Master is learned from Objaverse-Animation-HQ, a new dataset of curated part-level motion clips. We propose a strategy to automatically filter out sub-optimal animations and augment the synthetic renderings with meaningful motion trajectories. Puppet-Master generalizes well to real images across various categories and outperforms existing methods in a zero-shot manner on a real-world benchmark. See our project page for more results: vgg-puppetmaster.github.io.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
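The all-to-first attention mechanism mentioned in the Puppet-Master abstract can be pictured as a small change to where keys and values come from: every frame's tokens query the first (reference) frame's tokens, which is meant to anchor appearance and background. The module below is a toy illustration under my own assumptions about token shapes, not the authors' implementation.

```python
# Toy "all-to-first" attention: queries from all frames, keys/values from frame 0.
import torch
import torch.nn as nn

class AllToFirstAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens):
        """tokens: (batch, frames, patches, dim) video tokens."""
        B, F, P, D = tokens.shape
        first = tokens[:, 0]                       # keys/values come from the first frame
        queries = tokens.reshape(B, F * P, D)      # queries come from every frame
        out, _ = self.attn(queries, first, first)
        return out.view(B, F, P, D)

if __name__ == "__main__":
    x = torch.randn(2, 8, 196, 64)
    print(AllToFirstAttention()(x).shape)   # torch.Size([2, 8, 196, 64])
```

In a full model this would replace the per-frame spatial attention blocks, which is why the abstract calls it a drop-in substitute.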